Resources for comparing the speed and performance of medical autocoders
Abstract
BACKGROUND Concept indexing is a popular method for characterizing medical text and is one of the most important early steps in many data mining efforts. Concept indexing differs from simple word or phrase indexing because concepts are typically represented by a nomenclature code that binds a medical concept to all equivalent representations. A concept search on the term renal cell carcinoma would be expected to find occurrences of hypernephroma and renal carcinoma (concept equivalents). The purpose of this study is to provide freely available resources to compare speed and performance among different autocoders. These resources consist of: 1) a public domain autocoder written in Perl (a free and open source programming language that installs on any operating system); 2) a nomenclature database derived from the unencumbered subset of the publicly available Unified Medical Language System; 3) a large corpus of autocoded output derived from a publicly available medical text.
METHODS A simple lexical autocoder was written that parses plain text into a list of all 1-, 2-, 3-, and 4-word strings contained in the text, assigning a nomenclature code to each text string that matches a term in the nomenclature. The nomenclature used is the unencumbered subset of the 2003 Unified Medical Language System (UMLS). The unencumbered subset of UMLS was reduced to exclude homonymous one-word terms and proper names, resulting in a term/code data dictionary containing about half a million medical terms. The Online Mendelian Inheritance in Man (OMIM), a 92+ megabyte publicly available medical opus, was used as sample medical text for the autocoder.
RESULTS The autocoding Perl script is remarkably short, consisting of just 38 command lines. The 92+ megabyte OMIM file was completely autocoded in 869 seconds on a 2.4 GHz processor (less than 10 seconds per megabyte of text). The autocoded output file (9,540,442 bytes) contains 367,963 coded terms from OMIM and is distributed with this manuscript.
CONCLUSIONS A public domain Perl script is provided that can parse through plain-text files of any length, matching concepts against an external nomenclature. The script and associated files can be used freely to compare the speed and performance of autocoding software.
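The lexical matching described under METHODS can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the authors' 38-line script. The tiny nomenclature, its placeholder codes, the function name autocode, and the lowercase alphanumeric tokenization are all assumptions made for illustration; the real dictionary is derived from UMLS and holds about half a million terms.

```python
import re

# Hypothetical miniature nomenclature mapping term -> concept code.
# The codes are placeholders for illustration, not real UMLS identifiers.
NOMENCLATURE = {
    "renal cell carcinoma": "C0000001",
    "hypernephroma": "C0000001",
    "renal carcinoma": "C0000001",
    "kidney": "C0000002",
}

MAX_WORDS = 4  # the abstract describes 1-, 2-, 3-, and 4-word strings


def autocode(text, nomenclature=NOMENCLATURE, max_words=MAX_WORDS):
    """Return (term, code) pairs for every 1- to max_words-word string
    in the text that exactly matches a term in the nomenclature."""
    # Assumed tokenization: lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    for start in range(len(words)):
        for length in range(1, max_words + 1):
            if start + length > len(words):
                break  # no longer string fits at this position
            phrase = " ".join(words[start:start + length])
            code = nomenclature.get(phrase)
            if code is not None:
                matches.append((phrase, code))
    return matches


if __name__ == "__main__":
    sample = "Hypernephroma is a renal cell carcinoma of the kidney."
    for term, code in autocode(sample):
        print(code, term)
```

Because equivalent terms share one code in the dictionary, a search on the code finds hypernephroma and renal cell carcinoma alike. Looking up each candidate string in a hash table keeps the cost per word position constant, which is one plausible reason a lexical autocoder of this kind can process large files quickly.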
Journal title:
- BMC Medical Informatics and Decision Making
Volume 4
Publication date 2004